Skip to content

Add multi-period DiD support#6

Merged
igerber merged 1 commit into
mainfrom
claude/add-multi-period-support-OhFL8
Jan 2, 2026
Merged

Add multi-period DiD support#6
igerber merged 1 commit into
mainfrom
claude/add-multi-period-support-OhFL8

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented Jan 2, 2026

Extend the library to optionally work across multiple time periods:

  • Add MultiPeriodDiD estimator that handles multiple pre/post periods
  • Add MultiPeriodDiDResults for period-specific and average ATT
  • Add PeriodEffect dataclass for individual period treatment effects
  • Support for period dummies and treatment × period interactions
  • Compute average ATT with proper covariance-adjusted standard errors
  • Allow custom reference period selection
  • Auto-infer post-periods when not specified (last half of periods)
  • Full support for covariates, fixed effects, and absorbed FE
  • Cluster-robust and heteroskedasticity-robust standard errors
  • Comprehensive test suite with 28 new tests (70 total passing)

Extend the library to optionally work across multiple time periods:

- Add MultiPeriodDiD estimator that handles multiple pre/post periods
- Add MultiPeriodDiDResults for period-specific and average ATT
- Add PeriodEffect dataclass for individual period treatment effects
- Support for period dummies and treatment × period interactions
- Compute average ATT with proper covariance-adjusted standard errors
- Allow custom reference period selection
- Auto-infer post-periods when not specified (last half of periods)
- Full support for covariates, fixed effects, and absorbed FE
- Cluster-robust and heteroskedasticity-robust standard errors
- Comprehensive test suite with 28 new tests (70 total passing)
@igerber igerber merged commit 88c6542 into main Jan 2, 2026
@igerber igerber deleted the claude/add-multi-period-support-OhFL8 branch January 2, 2026 15:40
igerber pushed a commit that referenced this pull request Jan 4, 2026
Revised review reflects:
- #1, #4 verified as non-issues (correct by design)
- #3, #5, #6, #8, #13 addressed in commit e40d6b4
- Updated recommendation to approve and merge
- Remaining items are low-priority style suggestions for future PRs
igerber added a commit that referenced this pull request Apr 18, 2026
Addresses axis B findings #6 and #7 from the silent-failures audit:
trop_global.py:448 outer alternating-min loop, trop_global.py:466
hard-coded range(20) inner FISTA loop, and trop_local.py:680
alternating-minimization loop all exited silently on max_iter
exhaustion, returning the current iterate as if converged.

- trop_global._solve_global_with_lowrank: thread a converged flag through
  the outer loop; count non-convergence events from the inner FISTA and
  surface the count in the outer warning for diagnostic context. One
  warn_if_not_converged call per solver invocation.
- trop_local._estimate_model: thread a converged flag through the outer
  alternating-min loop; call warn_if_not_converged on exhaustion.
- REGISTRY updated under TROP.

New TestTROPConvergenceWarnings class (4 tests) exercises both global
and local paths with forced non-convergence (max_iter=1, tol=1e-15)
and a convergent negative control.

Notable: the default TROP local config (max_iter=100, tol=1e-6) does
not converge within max_iter on typical synthetic panels, so this PR
surfaces a previously silent non-convergence that affected routine
user fits. No numerical change in the returned iterate; the warning
is additive.

Axis-B regression-lint baseline: 5 -> 2 silent range(max_iter) loops
remaining (minor loops in honest_did/power not yet addressed).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 19, 2026
P0: bias_corrected_local_linear now routes the CI through
`safe_inference()` so degenerate cases with `se_robust <= 0` or
non-finite `se_robust` (e.g., exact-fit / constant-y) return
`(NaN, NaN)` rather than a misleading zero-width or infinite CI.
Matches the repo-wide inference contract (CLAUDE.md Key Design
Pattern #6).

P1: Auto-bandwidth path now calls `lpbwselect_mse_dpi` directly
with `cluster`, `vce`, and `nnmatch` forwarded. Previously it went
through `mse_optimal_bandwidth` which hard-codes unclustered / nn /
nnmatch=3, silently mismatching the downstream `lprobust` fit's
reported estimator.

Tests added: TestNaNSafeCI (constant-y + near-zero-SE) and
TestAutoBandwidthForwardsParameters (cluster+auto, vce='hc1'+auto,
nnmatch=5+auto), all asserting the selected bandwidth changes when
the corresponding parameter changes (catches silent fallback).

Also: suppress spurious BLAS FPE warnings in lprobust_bw's hc/hc2/hc3
branch (numpy issue #21432 pattern), newly reachable via the
wired-through vce='hc1' auto-bandwidth path.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 22, 2026
Close BR/DR gap #6: target-parameter block in schemas
igerber added a commit that referenced this pull request Apr 26, 2026
Wave 3 #6+#7 of the dCDH by_path follow-up sequence (after PR #378
shipped #5 by_path + controls). Removes the two NotImplementedError
gates at chaisemartin_dhaultfoeuille.py:1014-1023 and adds:

- New `path_cumulated_event_study` Results field surfacing the cumulated
  level effect delta_l per path under trends_linear=True (mirrors the
  global linear_trends_effects cumulation; this is what R returns under
  did_multiplegt_dyn(..., by_path, trends_lin)).
- `set_ids` parameter threaded through the four per-path IF helpers
  so trends_nonparam's set-restricted control pool reaches per-path
  analytical SE, bootstrap, placebos, and sup-t bands automatically.
- to_dataframe(level="by_path") gains cumulated_effect / cumulated_se
  columns (always present, NaN-when-None — mirrors cband_*).
- summary() renders a per-path "Cumulated Level Effects" sub-section.

Validated against R DIDmultiplegtDYN 2.3.3 via two new golden scenarios:
- single_baseline_multi_path_by_path_trends_lin (custom DGP: F_g >= 4,
  cohort-single-path, n_periods=13) — per-path cumulated point estimates
  match R bit-exactly (POINT_RTOL=1e-9), cumulated SE within ±20%
- multi_path_reversible_by_path_trends_nonparam — per-path point
  estimates AND placebos match R bit-exactly, SE within ±15%

Placebo parity for trends_linear is intentionally skipped: R's per-path
placebo re-runs on the path-restricted subsample with different control
eligibility than Python's global-then-disaggregate architecture, so the
divergence is methodological, not a tolerance issue. Internal regression
covers placebo + trends_linear (finite values, bootstrap inheritance).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 26, 2026
Wave 3 #6+#7 of the dCDH by_path follow-up sequence (after PR #378
shipped #5 by_path + controls). Removes the two NotImplementedError
gates at chaisemartin_dhaultfoeuille.py:1014-1023 and adds:

- New `path_cumulated_event_study` Results field surfacing the cumulated
  level effect delta_l per path under trends_linear=True (mirrors the
  global linear_trends_effects cumulation; this is what R returns under
  did_multiplegt_dyn(..., by_path, trends_lin)).
- `set_ids` parameter threaded through the four per-path IF helpers
  so trends_nonparam's set-restricted control pool reaches per-path
  analytical SE, bootstrap, placebos, and sup-t bands automatically.
- to_dataframe(level="by_path") gains cumulated_effect / cumulated_se
  columns (always present, NaN-when-None — mirrors cband_*).
- summary() renders a per-path "Cumulated Level Effects" sub-section.

Validated against R DIDmultiplegtDYN 2.3.3 via two new golden scenarios:
- single_baseline_multi_path_by_path_trends_lin (custom DGP: F_g >= 4,
  cohort-single-path, n_periods=13) — per-path cumulated point estimates
  match R bit-exactly (POINT_RTOL=1e-9), cumulated SE within ±20%
- multi_path_reversible_by_path_trends_nonparam — per-path point
  estimates AND placebos match R bit-exactly, SE within ±15%

Placebo parity for trends_linear is intentionally skipped: R's per-path
placebo re-runs on the path-restricted subsample with different control
eligibility than Python's global-then-disaggregate architecture, so the
divergence is methodological, not a tolerance issue. Internal regression
covers placebo + trends_linear (finite values, bootstrap inheritance).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request Apr 26, 2026
dCDH by_path: lift trends_linear + trends_nonparam gates (Wave 3 #6+#7)
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request Apr 27, 2026
Closes BR/DR foundation gap igerber#6 from project_br_dr_foundation.md:
BusinessReport and DiagnosticReport now name what the headline
scalar actually represents as an estimand, for each of the 16
result classes. Baker et al. (2025) Step 2 ("define the target
parameter") was previously in BR's next_steps list but not done
by BR itself — this PR closes that gap.

New top-level ``target_parameter`` block (additive schema
change; experimental per REPORTING.md stability policy):

  {
    "name": str,               # stakeholder-facing name
    "definition": str,         # plain-English description
    "aggregation": str,        # machine-readable dispatch tag
    "headline_attribute": str, # which raw result attribute
    "reference": str,          # REGISTRY.md citation pointer
  }

Schema placement: top-level block (user preference, selected via
AskUserQuestion in planning). Aggregation tags include "simple",
"event_study", "group", "2x2", "twfe", "iw", "stacked", "ddd",
"staggered_ddd", "synthetic", "factor_model", "M", "l", "l_x",
"l_fd", "l_x_fd", "dose_overall", "pt_all_combined",
"pt_post_single_baseline", "unknown".

Per-estimator dispatch lives in the new
``diff_diff/_reporting_helpers.py::describe_target_parameter``
(own module rather than business_report / diagnostic_report to
avoid circular-import risk — plan-review LOW igerber#7). All 17 result
classes covered (16 from _APPLICABILITY + BaconDecompositionResults);
exhaustiveness locked in by
TestTargetParameterCoversEveryResultClass.

Fit-time config reads:

- ``EfficientDiDResults.pt_assumption`` branches the aggregation
  tag between pt_all_combined and pt_post_single_baseline.
- ``StackedDiDResults.clean_control`` varies the definition clause
  (never_treated / strict / not_yet_treated).
- ``ChaisemartinDHaultfoeuilleResults.L_max`` +
  ``covariate_residuals`` + ``linear_trends_effects`` branches
  the dCDH estimand between DID_M / DID_l / DID^X_l /
  DID^{fd}_l / DID^{X,fd}_l.

Fixed-tag branches (per plan-review CRITICAL igerber#1 and igerber#2):

- ``CallawaySantAnna`` / ``ImputationDiD`` / ``TwoStageDiD`` /
  ``WooldridgeDiD``: the fit-time ``aggregate`` kwarg does not
  change the ``overall_att`` scalar — it only populates
  additional horizon / group tables on the result object.
  Disambiguating those tables in prose is tracked under gap igerber#9.
- ``ContinuousDiDResults``: the PT-vs-SPT regime is a user-level
  assumption, not a library setting. Emits a single
  "dose_overall" tag with disjunctive definition naming both
  regime readings (ATT^loc under PT, ATT^glob under SPT).

Prose rendering:

- BR ``_render_summary``: emits "Target parameter: <name>."
  after the headline sentence (short name only; full definition
  lives in the full_report and schema).
- BR ``_render_full_report``: "## Target Parameter" section
  between "## Headline" and "## Identifying Assumption".
- DR ``_render_overall_interpretation``: mirror sentence.
- DR ``_render_dr_full_report``: "## Target Parameter" section
  with name, definition, aggregation tag, headline attribute,
  and reference.

Cross-surface parity: both BR and DR consume the same helper
(the single source of truth), so their ``target_parameter``
blocks are byte-identical (verified by
TestTargetParameterCrossSurfaceParity).

Tests: 37 new (TestTargetParameterPerEstimator +
TestTargetParameterFitConfigReads +
TestTargetParameterCoversEveryResultClass +
TestTargetParameterCrossSurfaceParity +
TestTargetParameterProseRendering). Existing BR/DR top-level-key
contract tests updated to include ``target_parameter``. Total
319 tests pass (282 prior + 37 new).

Docs: REPORTING.md gains a "Target parameter" section
documenting the per-estimator dispatch and schema shape.
business_report.rst and diagnostic_report.rst note the new
field with a pointer to REPORTING.md. CHANGELOG entry under
Unreleased.

Out of scope: REGISTRY.md per-estimator "Target parameter"
sub-sections (plan-review additional-note); the reporting-layer
doc in REPORTING.md is the current source of truth. A follow-up
docs PR can land those sub-sections if maintainers want the
registry to own the canonical wording directly.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TDL77 pushed a commit to TDL77/diff-diff that referenced this pull request May 13, 2026
)

The CI AI reviewer's quality has notably degraded since igerber#404 and igerber#415 landed.
This restores 5 files to the snapshot at fe80295 (parent of igerber#404's merge,
i.e. the last commit on main before either PR landed):

  .github/workflows/ai_pr_review.yml  -- reinstates openai/codex-action@v1
                                         with model: gpt-5.4 and effort: xhigh
  .github/codex/prompts/pr_review.md  -- removes Single-Pass Completeness
                                         Mandate (igerber#404) and Audit igerber#6
                                         "Claim-vs-shipped" + tightened
                                         verdict bar (igerber#415); restores the
                                         original 179-line prompt
  .claude/scripts/openai_review.py    -- drops --ci-mode (igerber#415) and gpt-5.5
                                         PRICING/reasoning entries (igerber#404);
                                         DEFAULT_MODEL back to gpt-5.4
  .claude/commands/ai-review-local.md -- restores skill doc to match the
                                         restored script
  tests/test_openai_review.py         -- restores test suite to match
                                         (152 tests pass at the snapshot)

igerber#404's body documented the rollback criterion ("if >2 of next 5 PRs surface
new P1+ findings on unchanged code in round 2, revert the model bump"). This
invokes that rollback and extends it to also revert the Codex -> Python
single-shot backend switch from igerber#415.

Out of scope: open branch ci-workflow-ipynb-markdown-extraction (PR igerber#414,
unmerged) also touches openai_review.py and pr_review.md; that branch will
need to rebase against the older base after this lands.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
igerber added a commit that referenced this pull request May 20, 2026
…ueue + example refs

One informational P3 from CI codex R2 — METHODOLOGY_REVIEW.md still
described ContinuousDiD as "In Progress" in two surrounding surfaces
even after the status-table flip, creating conflicting status signals.

Fixed both sites:
1. L27 explanatory paragraph: removed the ContinuousDiD example from
   the In Progress band's "has methodology file but no paper review"
   illustration (it's now Complete).
2. L1289-1292 Priority Order queue: removed entry #9 (ContinuousDiD)
   and renumbered the remaining queue.

Retroactive fix per feedback_changelog_accuracy_fixes (CI review
catching one factual error in the queue means scanning for the same
mistake): PR #473 promoted HeterogeneousAdoptionDiD to Complete but
left entry #6 (HAD) in the same In Progress queue. Removed HAD's
entry too and renumbered, so the queue is now self-consistent with
the status table for all Complete entries.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
igerber added a commit that referenced this pull request May 22, 2026
Closes the WooldridgeDiD (ETWFE) methodology-review-tracker
promotion in METHODOLOGY_REVIEW.md (In Progress → Complete),
following the primary-source review for Wooldridge (2025) merged
in PR-A (#484). Adds two paper-driven implementation surfaces and
extends R-parity goldens to the nonlinear paths.

Implementation:
- `aggregate(weights="cohort_share")` on WooldridgeDiDResults
  implements paper Eqs. 7.4 (simple-overall) and 7.6 (event-time,
  restricted to k>=0) cohort-share aggregation weights as an
  opt-in alternative to the default cell-count weighting
  (matching Stata `jwdid_estat`). Inference fields fail-closed to
  NaN with UserWarning per paper Section 7.5 conditional-on-shares
  semantics; raises on `survey_design` (design-consistent totals
  deferred); raises on `type ∈ {"group","calendar"}` (no paper
  closed-form); raises on bootstrap fits (no matching bootstrap
  variant). Closes TODO row 95.
- `cohort_trends=True` on `WooldridgeDiD.__init__` adds linear
  `dg_i · t` cohort-specific trend interactions (paper Section 8
  / Eq. 8.1) for the OLS path. Rejects on logit/poisson per
  paper Section 8 OLS scope; rejects on survey_design pending
  full-dummy/TSL validation; enforces per-cohort pre-period
  identification check (≥ 2 observed pre-periods per treated
  cohort). Auto-routes to full-dummy mode regardless of
  vcov_type. Closes the PR-A Requirements Checklist
  heterogeneous-trends gap.

Tests:
- `tests/test_methodology_wooldridge.py` extended with 6
  paper-equation-numbered methodology classes (Theorem 3.1,
  Proposition 5.1, Section 6 event study, Section 7 aggregation
  paths, Section 8 heterogeneous trends, Section 10 unbalanced
  panels) + `TestW2025LibraryDeviations` consolidating 5 surviving
  deviations. Mirrors the HAD PR #473 precedent.
- Two new R-parity surface classes (`TestWooldridgeParityRPoisson`,
  `TestWooldridgeParityRLogit`) lock the structural surface
  against R `etwfe(family=...)` log-link goldens.
- 209 tests total (60 methodology + 149 R-parity + unit
  regressions).

R Goldens:
- `benchmarks/R/generate_wooldridge_golden.R` extended with
  Poisson + logit DGPs via R `etwfe`; augmented panel CSV
  retains the same seed-generated `y_pois` + `y_logit` columns
  for cross-language reproducibility.
- `benchmarks/R/requirements.R` pins `etwfe >= 0.5.0`.

Tracker promotion:
- METHODOLOGY_REVIEW.md L52 status flip with merge date; detail
  section L583-605 rewritten to the Verified Components / Test
  Coverage / Corrections Made / Deviations / Outstanding
  Concerns template mirroring HAD / ContinuousDiD / DCDH. L27
  example re-pointed; priority queue items #7-#10 renumbered to
  #6-#9.
- REGISTRY.md `## WooldridgeDiD (ETWFE)` extended with
  `### Deviations from the paper / from R / library extensions`
  block consolidating 7 surviving deviations + opt-in notes for
  cohort_share + cohort_trends + survey rejection + bootstrap
  cohort_share rejection contracts.
- CHANGELOG.md `[Unreleased]` `### Added` documents the new
  parameters, R-parity extension, and tracker flip.
- `docs/methodology/papers/wooldridge-2025-review.md`
  Requirements Checklist + Gaps & Uncertainties items 1 + 11
  marked `**Status:** Closed in PR-B`.
- `docs/api/wooldridge_etwfe.rst` updated with weighting-scheme
  notes alongside the existing aggregation table.

Second of two PRs for the WooldridgeDiD methodology-review-tracker
promotion. PR-A merged at e416aed (#484).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request May 22, 2026
…ment criteria

The CI AI reviewer (openai/codex-action@v1) was missing P2 findings the local
single-shot Responses API path consistently catches. Smoking gun on PR igerber#412
SHA 7f7e3d5: CI Codex returned a clean verdict with zero P2 findings; the
single-shot path on the same SHA flagged 3 real P2 gaps (docstring drift,
missing survey-composition tests, REGISTRY cross-doc inconsistency), all
real, two of which the user shipped a fix for in the next commit.

Three coordinated changes:

- Replace the openai/codex-action@v1 step in ai_pr_review.yml with a Python
  step that invokes .claude/scripts/openai_review.py in a new --ci-mode.
  Reuses the proven single-shot architecture instead of a parallel pipeline
  that drops findings. Workflow renamed from "AI PR Review (Codex)" to "AI PR
  Review". Existing <!-- ai-pr-review:codex:* --> comment markers preserved
  for backward compat with historical PR canonical comments.

- Add a directive Audit igerber#6 ("Claim-vs-shipped audit") to the Single-Pass
  Completeness Mandate in pr_review.md and its local-mode substitution in
  openai_review.py. Actively cross-references REGISTRY / CHANGELOG / PR-body
  claims against implementation, tests, public docstrings, rendering
  surfaces, and cross-doc consistency, flagging absences per the deferral
  rule. Required because the original prompt let reviewers enumerate what
  exists without auditing what was claimed but missing.

- Tighten assessment criteria so unmitigated P2 findings produce a "Needs
  changes" verdict (not "Looks good"), with explicit "must enumerate" wording
  preventing silent skips and a targeted carve-out preventing TODO.md from
  laundering shipped-behavior test gaps to P3. Mitigation via REGISTRY.md
  Notes or pre-existing TODO entries remains available for non-shipped-claim
  P2s.

The script's _SUBSTITUTIONS list is split into _LOCAL_FRAMING_SUBSTITUTIONS
(9 PR -> code-change tuples, applied only in local mode) and
_MANDATE_SUBSTITUTIONS (1 tuple, applied to all single-shot uses since
neither has tool access). New CLI flags --ci-mode, --pr-title, --pr-body
plumb PR context into a "## PR Context" prompt section in CI mode.
PR body wrapped in <pr-body untrusted="true">...</pr-body> with a
case/whitespace-tolerant regex stripping any literal closing tags.

Pre-merge verification on PR igerber#412 SHA 7f7e3d5 with the modified script
(--ci-mode --full-registry --context standard --model gpt-5.5): verdict
"Needs changes"; surfaces all 3 of the smoking-gun P2s.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request May 22, 2026
…GELOG H1 scope

- R6 fix left METHODOLOGY_REVIEW.md Deviations item igerber#6 stale (only updated
  the Verified-Components row). Item igerber#6 still said "Eq. 18 linear-trend-
  detrended joint Stute deferred". Rewritten to match the rest of the
  HAD tracker: trends_lin=True is SHIPPED + R-parity-locked in
  test_did_had_parity.py; the methodology-walkthrough file deliberately
  doesn't duplicate that coverage; the Pierce-Schott published-value
  numerical replication is what's waived (Deviations Note igerber#3).

- R6 narrowed the Verified-Components row + class docstring but missed the
  CHANGELOG bullet, which still claimed "joint Stute pre-trends + homogeneity
  H0 fail-to-reject + H1 reject under nonlinear DGP". Narrowed to:
  "H0 fail-to-reject on both surfaces and H1 reject for joint homogeneity
  under a nonlinear DGP" — matches the test file's actual scope.

All 35 methodology tests pass; lint clean.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
HanomicsIMF pushed a commit to HanomicsIMF/diff-diff that referenced this pull request May 22, 2026
Flip the ChaisemartinDHaultfoeuille (DCDH) row from In Progress to
Complete. Adds the Verified Components / Test Coverage / Corrections
Made / Deviations / Outstanding Concerns detail section mirroring the
ContinuousDiD (PR igerber#476) and HAD (PR igerber#473) precedents. Consolidates 7
DCDH deviations from the paper, from R DIDmultiplegtDYN, and library
extensions into a labeled REGISTRY surface per the AI-review
"Documenting Deviations" convention. CHANGELOG [Unreleased] gains a
new Added entry. L27 In Progress example re-pointed to WooldridgeDiD;
L1289 priority-order queue item igerber#6 removed and items igerber#7-igerber#11
renumbered to igerber#6-igerber#10.

No source code changes, no new tests, no new docstrings —
documentation consolidation only.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants